Abstract A number of structural genomics/proteomics 
initiatives are focused on bacterial or viral pathogens. In 
this article, we will review the progress of structural pro- 
teomics initiatives targeting the SARS coronavirus (SARS- 
CoV), the etiological agent of the 2003 worldwide epi- 
demic that culminated in approximately 8,000 cases and 
800 deaths. The SARS-CoV genome encodes 28 proteins 
in three distinct classes, many of them with unknown 
function and sharing low similarity to other proteins. The 
structures of 16 SARS-CoV proteins or functional domains 
have been determined to date. Remarkably, eight of these 
16 proteins or functional domains have novel folds, indi- 
cating the uniqueness of the coronavirus proteins. The 
results of SARS-CoV structural proteomics initiatives will 
have several profound biological impacts, including elu- 
cidation of the structure-function relationships of 
coronavirus proteins; identification of targets for the design 
of anti-viral compounds against SARS-CoV and other 
coronaviruses; and addition of new protein folds to the fold 
space, with further understanding of the structure-function 
relationships for several new protein families. We discuss 
the use of structural proteomics in response to emerging 
infectious diseases such as SARS-CoV and to increase 
preparedness against future emerging coronaviruses. 
Introduction 


One of the central aims of Structural Genomics is to 
determine the structures of proteins with biomedical 
importance, in order to understand the molecular basis of 
these diseases via the proteins involved, and thus to 
improve disease treatment, diagnosis or prevention. A 
number of Structural Genomics initiatives worldwide are 
focused on the structures of proteins related to human 
disease, including various bacterial, protozoan and viral 
pathogens. These include the TB Structural Genomics 
Consortium (http://www.doe-mtb.ucla.edu/TB/), involving 
50 laboratories across 9 countries and aiming to deter- 
mine 400 structures from Mycobacterium tuberculosis. The 
Structural Genomics of Pathogenic Protozoa initiative 
(http://www.sgpp.org/) is targeting the protozoan species 
that cause tropical diseases such as malaria, sleeping 
sickness, leishmaniasis and Chagas’ disease. In Europe, the 
Structural Proteomics IN Europe (SPINE) 
(http://www.spineurope.org/) programme focuses on both bacterial and 
viral pathogens: the former include Bacillus anthracis and 
Mycobacterium tuberculosis, while the latter include pox- 
viruses, herpesviruses and coronaviruses. Also in the area 
of viral pathogens, the focus of the VIZIER project 
(http://www.vizier-europe.org/) is comparative structural 
genomics of viral enzymes involved in replication. The 
specific aim of VIZIER is to identify potential new 
anti-viral targets against RNA viruses through targeting 
their replication machinery. However, VIZIER does not 
include the SARS virus as part of its sphere of activity. 

In 2003, the emergence of a form of pneumonia called 
severe acute respiratory syndrome (SARS) was attributed 
to a previously unknown coronavirus termed SARS-CoV 
[1, 2, 3, 4]. SARS-CoV was the aetiological agent for a 
worldwide epidemic with approximately 8,000 reported 
cases and 800 deaths, and its emergence was attributed to 
an animal-to-human interspecies transmission [5]. Coro- 
naviruses, characterized as enveloped, positive-stranded 
RNA viruses with the largest known genomes, belong to 
the genus Coronavirus of the family Coronaviridae [6, 7]. 
Approximately 26 species of coronaviruses (CoVs) can be 
classified into three distinct groups on the basis of genome 
sequence and serological reaction [8]. Prior to the outbreak, 
very little attention was paid to the structure-function 
studies of coronavirus proteins by researchers as this genus 
of virus predominantly causes severe diseases in animals 
and comparatively mild diseases in humans. While exten- 
sive research had been carried out on model coronaviruses 
over the previous 20 years or so, little was understood 
about underlying mechanisms such as viral assembly and 
viral replication/transcription prior to the SARS outbreak. 

The SARS-CoV genome is approximately 29,700 nu- 
cleotides and is composed of at least 14 functional open 
reading frames (ORFs) that encode 28 proteins covering 
three classes: two large polyproteins, pp1a and pp1ab, 
that are cleaved into 16 non-structural proteins required 
for viral RNA synthesis (and probably with other func- 
tions); four structural proteins (the S, E, M and N- 
proteins) essential for viral assembly; and eight accessory 
proteins that are thought unimportant in tissue culture but 
may provide a selective advantage in the infected host 
(Table 1, Fig. 1) [9]. Many of the 28 SARS-CoV proteins 
share low sequence similarity with other proteins, 
including those from other viruses, indicating their 
uniqueness and hampering functional assignment based on 
homology. 

In this review, we will focus on the current progress in 
SARS coronavirus (SARS-CoV) structural proteomics 
initiatives and assess their biological impact. In addition 
to several traditional structural biologists, there are cur- 
rently three major international structural proteomics 
initiatives focused on SARS-CoV: in China (our group, 
led by Zihe Rao), USA (The Scripps Research Institute, 
led by Peter Kuhn) and France (University of Marseilles, 
led by Bruno Canard). Other SARS-CoV protein struc- 
tures have been solved by the SPINE consortium led by 
David Stuart. The strategies adopted by the three groups 
are similar: to systematically determine the three-dimen- 
sional structure of each protein encoded by the SARS 
coronavirus in order to elucidate their function and 
identify potential new therapeutic targets. Drug development 
strategies targeting SARS-CoV are focused on two 
main avenues: inhibitors to block virus entry into the host 
cells, and compounds to prevent viral replication and 
transcription. The three structural proteomics initiatives 
have focused more specifically on the replication/tran- 
scription machinery formed by the 16 non-structural 
proteins. 


Non-structural proteins 


The SARS-CoV replicase gene encodes 16 non-structural 
proteins (nsps) with multiple enzymatic functions [10]. 
These are known or are predicted to include types of 
enzymes that are common components of the replication 
machinery of plus-strand RNA viruses: an RNA-dependent 
RNA polymerase activity (RdRp, nsp12), a 3C-like 
cysteine protease activity (Mpro or 3CLpro, nsp5), a 
papain-like protease activity (PL2pro, nsp3), and a 
superfamily 1-like helicase activity (HEL1, nsp13). In 
addition, the replicase gene encodes proteins that are 
indicative of 3'-5' exoribonuclease activity (ExoN homolog, 
nsp14), endoribonuclease activity (XendoU homolog, nsp15), 
adenosine diphosphate-ribose 1"-phosphatase activity (ADRP, 
nsp3), and ribose 2'-O-methyltransferase activity (2'-O-MT, 
nsp16) [10]. These enzymes are less common in 
positive-strand RNA viruses and may therefore be related to 
the unique properties of coronavirus replication and 
transcription. Finally, the replicase gene encodes another 
nine proteins whose structure and function are poorly 
understood. Here we detail the available structures of 
non-structural proteins, of which nsp5 is the most widely 
characterized. 


Nsp1 


The non-structural protein nsp1 is the N-terminal cleavage 
product of the viral replicase polyprotein that mediates 
RNA replication and processing. Nsp1 lacks any viral or 
cellular homologs other than in coronaviruses and its pre- 
cise function remains unknown, although it has been shown 
to specifically accelerate mRNA degradation with a 
reduction in cellular protein synthesis. An NMR structure 
of nsp1 covering residues 13-128 was determined by Kurt 
Wüthrich and colleagues as part of the US structural 
proteomics initiative [11] and presents a novel irregular 
β-barrel fold, indicating an unidentified and possibly 
unique biological function (Fig. 1). The full-length nsp1 
protein, also characterized by Wüthrich and colleagues, has 
two flexibly disordered polypeptide segments comprising 
residues 1-12 and 129-179. 
Table 1 Summary of SARS proteins 

| Protein | Protein size (a.a.) | ORF (location in genome sequence) | Putative functional domain(s) | Structure available |
|---|---|---|---|---|
| Structural proteins | | | | |
| Spike (S) protein | 1255 | ORF2 (21492-25259) | | Yes (fusion core, receptor binding domain) |
| Envelope (E) protein | 76 | ORF4 (26117-26347) | | No |
| Membrane (M) protein | 221 | ORF5 (26398-27063) | | No |
| Nucleocapsid (N) protein | 422 | ORF9a (28120-29388) | | Yes (N-terminal RNA binding domain, C-terminal domain) |
| Non-structural proteins (Nsp) | | | | |
| Nsp1 | 180 | ORF 1a (265-804) | | Yes |
| Nsp2 | 638 | ORF 1a (805-2718) | | No |
| Nsp3 | 1922 | ORF 1a (2719-8484) | Ac, X, PLpro, Y (TM1), ADRP | Yes (Glu-rich*, ADRP, PLpro domains) |
| Nsp4 | 500 | ORF 1a (8485-9984) | TM2 | No |
| Nsp5 | 306 | ORF 1a (9985-10902) | Mpro | Yes |
| Nsp6 | 290 | ORF 1a (10903-11772) | TM3 | No |
| Nsp7 | 83 | ORF 1a (11773-12021) | | Yes |
| Nsp8 | 198 | ORF 1a (12022-12615) | | Yes |
| Nsp9 | 113 | ORF 1a (12616-12954) | ssRNA binding | Yes |
| Nsp10 | 139 | ORF 1a (12955-13371) | GFL | Yes |
| Nsp11 | 13 | ORF 1a (13372-13410) | | No |
| Nsp12 | 932 | ORF 1b (13398-16166) | RdRp | No |
| Nsp13 | 601 | ORF 1b (16167-17969) | ZD, NTPase, HEL1 | No |
| Nsp14 | 527 | ORF 1b (17970-19550) | Exonuclease (ExoN homolog) | No |
| Nsp15 | 346 | ORF 1b (19551-20588) | NTD, endoRNase (XendoU homolog) | Yes |
| Nsp16 | 298 | ORF 1b (20589-21482) | 2'-O-MT | No |
| Accessory proteins | | | | |
| Orf3a | 274 | ORF3a (25268-26092) | | No |
| Orf3b | 154 | ORF3b (25689-26153) | | No |
| Orf6 | 63 | ORF6 (26913-27265) | | No |
| Orf7a | 122 | ORF7a (27273-27641) | Ig-like | Yes (Luminal domain) |
| Orf7b | 44 | ORF7b (27638-27772) | | No |
| Orf8a | 39 | ORF8a (27779-27898) | | No |
| Orf8b | 84 | ORF8b (27864-28118) | | No |
| Orf9b | 98 | ORF9b (28130-28426) | | Yes |

* Indicates that a structure has been deposited in the Protein Data Bank but has not been published 

Abbreviations: PLpro, papain-like protease; ADRP, adenosine diphosphate-ribose 1"-phosphatase; TM, transmembrane domain; Mpro, main (or 
3C-like cysteine) protease; GFL, growth factor-like domain; RdRp, RNA-dependent RNA polymerase; ZD, putative Zinc-binding domain; 
HEL1, superfamily 1 helicase; NTD, nidovirus conserved domain; ExoN, 3'-to-5' exonuclease; 2'-O-MT, S-adenosylmethionine-dependent 
ribose 2'-O-methyltransferase 
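
The sizes and genome locations in Table 1 can be cross-checked 
with simple arithmetic. The following is a minimal sketch, not 
part of the original analysis. It assumes that the coordinates 
are 1-based and inclusive, that cleavage products of the 
polyproteins carry no stop codon, and that stand-alone ORFs 
include one. The function name, the row selection and the 
stop-codon rule are our own illustrative choices. 

```python
# Illustrative check of Table 1: protein length implied by genome coordinates.
# Assumptions (not stated in the table): coordinates are 1-based and inclusive;
# stand-alone ORFs end with a stop codon, polyprotein cleavage products do not.

def implied_length(start: int, end: int, has_stop_codon: bool) -> int:
    """Return the number of amino acids encoded by genome positions start..end."""
    n_nt = end - start + 1
    if n_nt % 3 != 0:
        raise ValueError(f"{n_nt} nt is not a whole number of codons")
    codons = n_nt // 3
    return codons - 1 if has_stop_codon else codons

# (name, start, end, size listed in Table 1, has_stop_codon)
ROWS = [
    ("Nsp1", 265, 804, 180, False),
    ("Nsp5", 9985, 10902, 306, False),
    ("Nsp9", 12616, 12954, 113, False),
    ("Spike (S)", 21492, 25259, 1255, True),
    ("Nucleocapsid (N)", 28120, 29388, 422, True),
    ("Orf7a", 27273, 27641, 122, True),
]

def check_rows(rows):
    """Map each protein name to True if coordinates and listed size agree."""
    return {
        name: implied_length(start, end, stop) == size
        for name, start, end, size, stop in rows
    }

if __name__ == "__main__":
    for name, ok in check_rows(ROWS).items():
        print(f"{name}: {'consistent' if ok else 'mismatch'}")
```

This simple rule is not expected to hold for every row, for 
example for entries that span the ORF 1a/1b junction. 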


Nsp3 ADRP and PLpro domains 


One limitation of SARS structural proteomics is the 
difficulty in expressing soluble, stable and functional 
proteins. One workaround is to identify the functional 
domains of individual proteins to increase the chance of 
successful structure determination. Such an approach was 
taken in the case of nsp3, which is a large, multidomain 
protein yielded by proteolytic cleavage of the pp1a 
polyprotein at two sites by the papain-like protease 
(PLpro). It comprises 1,922 amino acids and features 
conserved sequence motifs for six domains: (1) an 
N-terminal Glu-rich acidic domain; (2) an ‘X’ domain with 
predicted Appr-1"-p processing activity; (3) a SUD 
domain (SARS-specific unique domain); (4) a peptidase 
C-16 domain that contains the PLpro; (5) a transmembrane 
domain; and (6) the ‘Y’ domain. Peter Kuhn and colleagues 
at Scripps determined the crystal structures of two 
functional domains of nsp3: the ‘X’ or 
ADP-ribose-1"-phosphate dephosphorylation (ADRP) domain [12] 
and the papain-like protease (PLpro) domain [13]. A third 
structure from the Scripps consortium, determined by NMR, is 
available in the Protein Data Bank for the N-terminal 
Glu-rich acidic domain. The French consortium of Bruno 
Canard and colleagues has also reported a structure-function 
study of the ADRP domain [14]. 

Fig. 1 Summary of SARS-CoV protein structures to date. The 
SARS-CoV genome is shown surrounded by the available 
structures of SARS-CoV proteins (drawn in ribbon 
representation): nsp1, nsp3 (Glu-rich, ADRP and PLpro 
domains), nsp5, nsp7, nsp8, nsp9, nsp10, nsp15, Spike 
protein (receptor binding domain and fusion core), 
N-protein (N-terminal RNA-binding domain and C-terminal 
dimerization domain), orf7a and orf9b. Orange and blue 
triangles represent PLpro (nsp3) and Mpro (nsp5) cleavage 
sites, respectively. Structures shown above the genome 
(nsp5, nsp7, nsp8, nsp10, nsp15, S-protein fusion core) were 
solved by Zihe Rao and colleagues in China. Representative 
structures shown below the genome were solved by other 
groups. 

The structure of the ‘X’ domain, also known as the 
ADRP domain, reveals a close structural relationship with 
macro-H2A-like fold proteins (Fig. 1). Furthermore, the 
‘X’ domain shares sequence homology with Poa1p from 
Saccharomyces cerevisiae, which is known to be a highly 
specific phosphatase that removes the 1" phosphate group 
of ADP-ribose-1"-phosphate (Appr-1"-p) in the tRNA 
splicing pathway. Using in vitro assays, the authors 
confirmed that the nsp3 ‘X’ domain does indeed remove 
the 1" phosphate group of Appr-1"-p. 

The structure of the PLpro domain of nsp3 was determined 
in 2006 and found to possess a “thumb-palm-fingers” fold 
related to known deubiquitinating enzymes (Fig. 1). 
However, certain key features of nsp3 PLpro, including a 
zinc-binding motif and a ubiquitin-like N-terminal domain, 
separate it from other characterized deubiquitinating 
enzymes. The availability of the nsp3 PLpro structure now 
provides a clearer understanding of the proteolytic 
processing at the consensus (LXGG) cleavage site and 
provides details at the molecular level for the mechanism 
of deubiquitination, suggesting an important dual role for 
this enzyme. 
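
To make the LXGG consensus concrete, the sketch below scans a 
peptide string for every Leu-X-Gly-Gly tetrapeptide, where X 
is any residue. It is an illustration only: the function name 
and the demonstration peptide are made up, and a sequence 
match on its own does not establish that a site is processed. 

```python
import re

# Lookahead so that overlapping occurrences would also be reported.
LXGG = re.compile(r"(?=(L.GG))")

def find_lxgg(seq: str):
    """Return (1-based start position, tetrapeptide) for each LXGG match."""
    return [(m.start() + 1, m.group(1)) for m in LXGG.finditer(seq.upper())]

if __name__ == "__main__":
    # Made-up peptide for demonstration, not a SARS-CoV sequence.
    demo = "MKLNGGAVTRELKGGSP"
    for position, motif in find_lxgg(demo):
        print(position, motif)
```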

At the time of writing, the structure of a third domain of 
nsp3, the Glu-rich acidic domain, has been deposited in the 
Protein Data Bank with accession number 2GRI yet 
remains unpublished. Determined by the Scripps group 
using NMR, the solution structure has a globular α-helical 
fold (Fig. 1). A DALI search for structural similarity shows 
no significant structural homologs. 


Nsp5, the SARS-CoV main protease 


The replicase polyproteins pp1a and pp1ab undergo 
extensive proteolytic processing by viral proteases to 
produce multiple functional subunits, which are involved in 
formation of the replicase complex to mediate viral 
replication and transcription. The coronavirus main protease 
(Mpro), also known as the 3C-like protease (3CLpro) after 
the 3C proteases of the Picornaviridae, is a ~33 kDa 
cysteine protease that cleaves the replicase polyprotein at 
11 conserved sites involving canonical Leu-Gln↓(Ser, Ala, 
Gly) sequences. The cleavage process is initiated by the 
enzyme’s own autolytic cleavage from pp1a and pp1ab [15, 
16]. Its functional importance in the viral life cycle and the 
lack of closely related cellular homologs make the Mpro an 
attractive target for the development of drugs directed not 
only against SARS, but also against other coronavirus 
infections. 

Fig. 2 Functional oligomers of SARS-CoV proteins. (A) Nsp5, the 
main protease (Mpro). SARS-CoV Mpro, shown in ribbon 
representation, is active as a dimer. (B) Nsp9, the ssRNA binding 
protein. SARS-CoV nsp9, shown in ribbon representation, functions 
as a dimer. (C) Nsp10, a zinc finger protein. SARS-CoV nsp10, 
shown in ribbon representation, can exist as a dodecamer in 
solution. The active form of nsp10 remains to be determined. Zinc 
ions are shown as grey spheres. (D) Nsp15, the endoribonuclease. 
Nsp15, shown in ribbon representation, is active as a hexamer. 
(E) The S-protein fusion core. The HR1 and HR2 peptides together 
form a six-helix bundle characteristic of class I viral fusion 
proteins. (F) The N-protein dimerization domain. The C-terminal 
domain of the N-protein functions as a dimer 

The crystal structure of SARS-CoV Mpro was determined 
in 2003, mere months after the emergence of the 
epidemic, by our group in Tsinghua University, Beijing 
[17], and by the San Diego-based company Structural 
GenomiX (Fig. 1). Structural analysis confirmed that the 
functional unit of the Mpro is a dimer, with the first seven 
N-terminal residues (called the “N-finger”) important for 
stabilizing the active pocket of the neighbouring monomer 
(Fig. 2A). The availability of the Mpro structures in the 
Protein Data Bank enabled other researchers worldwide to 
design inhibitors targeting this important replication 
enzyme, thus speeding up drug development in case of the 
re-emergence of SARS. Prior to this, homology models 
constructed from the crystal structures of the Mpro from 
human coronavirus strain 229E (HCoV-229E) and porcine 
transmissible gastroenteritis virus (TGEV) [18, 19], both 
group I coronaviruses, were widely used to design 
anti-SARS inhibitors. However, comparison between the 
SARS-CoV Mpro structure and a homology model 
constructed from HCoV-229E and TGEV Mpro (PDB ID: 
1P9T) [19] showed a root-mean-square deviation of 3.8 Å 
[17]. There have since been widespread reports of various 
strategies used to design inhibitors targeting the 
SARS-CoV Mpro (see [20] for a review). In 2005, our group 
confirmed that the Mpro is significantly conserved among 
all three coronavirus antigenic groups and, moreover, that 
inhibitors designed to target the SARS-CoV Mpro can be 
effective ‘broad spectrum’ inhibitors against all 
coronavirus Mpro [21]. 
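
For readers unfamiliar with the root-mean-square deviation 
quoted above, the following sketch shows the basic 
calculation over paired atomic coordinates. It assumes that 
equivalent atoms have already been matched one-to-one and that 
the two structures have already been superimposed; it performs 
no fitting itself. The function name and the toy coordinates 
are ours, and the text does not state which atoms or which 
superposition were used for the 3.8 Å value. 

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length lists of (x, y, z).

    Assumes the atoms are paired in order and the structures are already
    superimposed; no optimal fitting is performed here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

if __name__ == "__main__":
    # Toy coordinates in angstroms, for illustration only.
    model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    crystal = [(0.0, 0.0, 3.0), (1.0, 0.0, 3.0)]
    print(rmsd(model, crystal))
```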


Nsp7 and nsp8 


In 2005, our group in Tsinghua University identified the 
interaction between two non-structural proteins, nsp7 and 
nsp8, by GST pulldown experiments. From the subsequent 
determination of the crystal structure of the nsp7-nsp8 
protein-protein complex, eight copies of nsp7 and eight 
copies of nsp8 were observed to form an intricate hollow 
cylindrical scaffold (Fig. 3A) [22]. The inner dimensions 
and electrostatic properties of the cylindrical nsp7-nsp8 
structure enable it to encircle nucleic acid, and an 
interaction was demonstrated with dsRNA by EMSA and 
mutagenesis. The architecture and electrostatic properties 
are reminiscent of PCNA or the β-subunit ring, the 
processivity factors of DNA polymerase, leading us to 
postulate that the nsp7-nsp8 complex should be a 
processivity factor for the RNA-dependent RNA polymerase 
(nsp12). Interestingly, both nsp7 and nsp8 were found to 
possess novel folds: nsp7 is an α-helical bundle, while nsp8 
has a so-called ‘golf club’ fold with an N-terminal α-helical 
‘shaft’ domain and a C-terminal mixed α/β ‘head’ domain 
(Fig. 1). Within the complex framework, nsp8 exists 
simultaneously in two conformations: one with an extended 
α-helical ‘shaft’ domain, and the other with a bent ‘shaft’ 
domain. The solution structure of nsp7 alone, also 
determined in 2005 by the Scripps consortium, adopts the same 
α-helical bundle observed in the crystal structure [23]. 

Fig. 3 SARS-CoV protein-protein complexes. (A) The structure of 
the nsp7-nsp8 supercomplex. The complex assembly is formed by 
eight copies of nsp7 and nsp8. Nsp8 exists simultaneously in two 
conformations, termed nsp8I and nsp8II. Nsp7, nsp8I and nsp8II are 
shown in ribbon representation (top) and colored blue, green and 
orange respectively. The complex (below) is assembled from two 
tetramers: T1, formed between nsp7 and nsp8I (center, left); and T2, 
formed between nsp7 and nsp8II (center, right). A surface 
representation showing the charge distribution is also shown (below 
right), with positive charge colored in blue and negative charge 
colored in red. The positive charge distributed around the central 
channel of the nsp7-nsp8 complex is favourable for the passage of 
RNA. (B) The SARS-CoV S-protein receptor binding domain complexed 
with the receptor ACE2. The complex structure is shown in ribbon 
representation with the ACE2 receptor colored in green, the 
S-protein receptor binding domain (RBD) colored in blue and the 
S-protein receptor binding motif (RBM) colored in red 

In a follow-up study by Imbert and colleagues from the 
French consortium [24], it was reported that nsp8 
constitutes a second RNA-dependent RNA polymerase (RdRp) in 
addition to nsp12, which includes an RdRp domain 
conserved in all RNA viruses. Distant structural homology was 
found between nsp8 and the catalytic palm subdomain of 
RNA virus RdRps. Further activity assays confirmed that 
nsp8 recognizes specific short sequences in the ssRNA 
coronavirus genome to catalyze the synthesis of short 
products (<6 nucleotides) with low fidelity. The properties 
of nsp8 indicate that it most likely functions as a primase 
to catalyze the synthesis of RNA primers for the 
primer-dependent nsp12, which is a unique characteristic of 
coronaviruses. It is worth noting that nsp8 alone can form a 
complex in solution and possesses similar activity to the 
nsp7-nsp8 complex, but has poor thermal stability as 
predicted from our crystal structure. Nsp7 therefore serves 
as ‘mortar’ to stabilize the nsp8 scaffold. 


Nsp9, a single-stranded RNA binding protein 


Crystal structures of nsp9 were determined simultaneously 
in 2004 by the French consortium (to 2.7 Å resolution) [25] 
and by the SPINE consortium (to 2.8 Å resolution) [26], 
and established its previously unknown function as a 
single-stranded RNA binding protein whose biological unit is 
a dimer (Fig. 2B). The core structure of the protein is an 
open 6-stranded β-barrel reminiscent of, yet unrelated to, 
the nucleic acid binding OB (oligosaccharide/oligonucleotide 
binding) fold (Fig. 1). Searches for structural 
homology revealed that nsp9 shares similarity with certain 
subdomains of serine proteases, including domain II of the 
SARS-CoV Mpro. Based on the similarity to the picornavirus 
3C proteases, which feature a conserved RNA 
binding motif, it was inferred that nsp9 should bind ssRNA, 
and this was subsequently confirmed by EMSA and surface 
plasmon resonance. One role of nsp9 may be to stabilize 
nascent and template RNA strands during replication and 
transcription and protect them against nuclease processing. 
Besides replication, it is believed that nsp9 may also be 
involved in base-pairing driven processes such as RNA 
processing. 


Nsp10, a novel zinc-finger protein 


An international collaborative effort between the Chinese 
and American groups led to the determination of the 
structure of SARS-CoV nsp10 as a dodecamer [27] and as a 
monomer [28], respectively. The monomer structure, 
possessing a novel fold, contains two zinc-fingers with the 
sequence motifs C-(X)2-C-(X)5-H-(X)6-C and 
C-(X)2-C-(X)7-C-(X)-C (Fig. 1). These zinc finger motifs are 
strictly conserved among the three coronavirus antigenic 
groups, implying an essential function for nsp10 in all 
coronaviruses. A Pfam search identified a match for nsp10 
with the HIT-type zinc finger proteins, which had previously 
not been structurally characterized. While zinc finger 
proteins often play a role in transcription, the precise 
function of nsp10 in the viral life cycle remains to be 
determined. Nsp10 is located next to nsp8 and nsp9 in the 
SARS-CoV genome; both nsp8 and nsp9 are known to interact 
with RNA, and nsp10 features a large patch of positive 
charge distributed on its surface, all of which suggest that 
nsp10 should also interact with nucleic acid. However, our 
experiments and those of Joseph and colleagues found only 
weak affinity between nsp10 and both ssRNA and dsRNA. 
Further work is also needed to ascertain the significance of 
the oligomeric state of SARS-CoV nsp10 (Fig. 2C). The 
monomer structure has an intact second zinc-finger which 
appears to stabilize the C-terminal tail of nsp10. However, 
in the dodecamer structure, the second zinc-finger lacks the 
last cysteine residue and the remainder of the C-terminal 
tail is disordered. 
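
The two zinc-finger sequence motifs translate naturally into 
regular expressions. The sketch below reads the spacings as 
2, 5 and 6 residues for the first motif and 2, 7 and 1 for 
the second; that reading, the names ZF1, ZF2 and find_motif, 
and the demonstration string are assumptions made for 
illustration. The demonstration string is synthetic and is 
not the nsp10 sequence. 

```python
import re

# First motif:  C-(X)2-C-(X)5-H-(X)6-C
ZF1 = re.compile(r"(?=(C.{2}C.{5}H.{6}C))")
# Second motif: C-(X)2-C-(X)7-C-(X)-C
ZF2 = re.compile(r"(?=(C.{2}C.{7}C.C))")

def find_motif(pattern, seq: str):
    """Return (1-based start position, matched segment) for each motif hit."""
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(seq.upper())]

# Synthetic string built to contain one instance of each motif.
DEMO = "AA" + "CAACAAAAAHAAAAAAC" + "GG" + "CAACAAAAAAACAC" + "AA"

if __name__ == "__main__":
    print(find_motif(ZF1, DEMO))
    print(find_motif(ZF2, DEMO))
```

A match indicates only that the cysteine and histidine 
spacing is present; whether the residues coordinate zinc 
must be judged from the structure itself. 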


Nsp15, an endoribonuclease 


The crystal structures of nsp15 have been determined from 
SARS-CoV by the French consortium [29] and from mouse 
hepatitis virus (MHV) by the Chinese consortium [30]. 
Both SARS-CoV and MHV belong to antigenic group II of the 
genus Coronavirus. Nsp15 functions as an XendoU-type 
endoribonuclease and the active biological unit is a 
hexamer (Fig. 2D). Nsp15 has a novel fold and is the first 
member of the XendoU family of endoribonucleases to be 
characterized, providing the first structural and mechanistic 
characteristics for this family of enzymes. It also represents 
the first crystal structure of an endoribonuclease from the 
genus Coronavirus. The nsp15 monomer structure consists 
of three subdomains: a small N-terminal domain formed by two 
α-helices packed against a three-stranded β-sheet; a middle 
domain comprising a mixed β-sheet, two smaller β-sheets 
and two short α-helices; and a C-terminal domain 
made up of two β-sheets and five α-helices. Each of the 
three subdomains in turn has a novel fold (Fig. 1). 

Only the hexameric form of nsp15 is known to bind 
RNA, and the affinity of interaction can be increased by 
Mn2+ ions. The US consortium recently determined the 
crystal structure of SARS-CoV nsp15 in a shortened 
monomeric form as a means of understanding the 
relationship between hexamer formation and activity (P. Kuhn, 
personal communication). In the absence of 
monomer-monomer interactions, the catalytic loop of nsp15 
flips back to occupy the active site cleft. Given its 
critical importance in the viral life cycle, nsp15 is an 
attractive target for anti-viral drug design. Strategies for 
inhibitor design include the design of active site 
inhibitors, non-peptidyl compounds that mimic the 
catalytic loop of nsp15, and compounds that disrupt formation 
of the hexamer species. 


Structural proteins 


The SARS-CoV genome encodes four structural proteins 
that are required to drive cytoplasmic viral assembly: the 
spike (S) protein, the membrane (M) protein, the nucleo- 
capsid (N) protein and the envelope (E) protein. The S- 
protein is mainly responsible for binding to the host cell 
and for subsequent cell entry by virus-cell membrane 
fusion. We will focus on the S-protein and N-protein, 
whose partial structures have been solved. 


SARS-CoV spike protein fusion core 


Similar to other class I virus fusion proteins, the 
SARS-CoV S-protein can be divided into an N-terminal half (S1) 
and C-terminal half (S2), but without proteolytic cleavage 
[31]. S1 and S2 are respectively responsible for variations 
in host range and tissue tropism, through receptor 
specificity, and for cell entry by virus-cell membrane 
fusion [32]. S1 is responsible for binding to cellular 
receptors, and one potential SARS-CoV receptor has been 
identified as angiotensin-converting enzyme 2 (ACE2) [33]. 
S2 contains an internal fusion peptide and has two 
hydrophobic (heptad) repeat regions designated HR1 and HR2 
[34]. The putative fusion peptide has recently been 
identified upstream close to HR1 [35]. HR2 is located close 
to the transmembrane region some 170 amino acids (aa) 
downstream of HR1 [34]. Don Wiley and colleagues first 
established the classical mechanism of class I fusion 
proteins for mediating enveloped virus and host-cell 
membrane fusion from their comprehensive study of 
influenza hemagglutinin (HA) [36, 37]. In subsequent 
years, a common fusion mechanism has been established 
from extensive structural studies on the viral families of 
orthomyxovirus, retrovirus, paramyxovirus, and filovirus 
[36]. 

In 2004, the structure of the spike (S) protein fusion core 
was determined by two groups in the postfusion (or 
fusion-active) state, albeit by employing slightly different 
strategies [31, 38]. The Chinese structural proteomics 
initiative utilized a single chain by engineering a linker 
between the HR1 and HR2 domains to prepare the fusion core 
(HR1: 900-948, HR2: 1145-1184), while Supekar and colleagues 
individually synthesized longer HR1 and HR2 peptides (HR1: 
889-972, HR2: 1142-1185). Both structures exhibit a 
six-helix bundle in which three HR1 helices form a central 
coiled-coil surrounded by three HR2 helices in an oblique, 
antiparallel manner (Figs. 1, 2E). HR2 peptides pack into 
the hydrophobic grooves of the HR1 trimer in a mixed 
extended and helical conformation, representing a stable 
postfusion structure similar to that for HIV-1 gp41 [36]. 
The N-terminus of HR1 and the C-terminus of HR2 are 
located at the same end of the six-helix bundle, which 
would place the fusion peptide and transmembrane region 
close together. Supekar and colleagues also provided a 
structure of an S2 fragment consisting of a smaller peptide 
of HR1 (919-949) and a peptide of HR2 (1149-1193) with 
extra C-terminal residues in proximity to the 
transmembrane region [31]. The C-terminal part is α-helical 
and points away from the HR1 trimer axis, probably resulting 
from the lack of stabilization by the corresponding HR1 
region, and may mimic the conformation of this region 
before the formation of the final postfusion hairpins. A later 
structure reported by Duquerroy and colleagues (HR1: 
890-973, HR2: 1145-1190) emphasized the hydrogen-bonding 
network formed by conserved asparagine and glutamine 
residues, together with two possible chloride ions, which 
could stabilize the conformation of the postfusion hairpin 
[39]. Fusogenic mechanisms mediated by SARS-CoV were 
proposed according to those of other class I fusion proteins. 
However, understanding the possible conformational changes 
of the HR1 and HR2 regions during the membrane fusion 
process will require further structural studies of the 
S-protein in its native state and of the pre-hairpin 
intermediate, which probably results from S1 binding to a 
receptor (e.g. ACE2). 

Several peptides derived from the HR1 and HR2 regions of 
SARS-CoV spike proteins have been demonstrated to 
block viral entry by targeting the putative pre-hairpin 
intermediate [40, 41, 42]. For instance, peptides derived 
from HR2, and not from HR1, are sufficient to inhibit 
SARS-CoV infection [40, 41]. Interestingly, the efficacy of 
HR2 peptides derived from the SARS-CoV spike protein is 
lower than that of the corresponding HR2 peptides of MHV 
in inhibiting MHV infection [40]. This might be explained 
by the larger surface area buried in the HR1-HR2 interface 
of MHV S2 than in that of SARS-CoV S2 [31], resulting in a 
higher affinity of the MHV peptides for the corresponding 
HR1 trimer [40]. In any case, the availability of the 
HR1-HR2 fusion core structure will help in the discovery of 
viral entry inhibitors against SARS. 


SARS-CoV spike protein receptor binding domain 


An important part of the structure-function studies of any 
virus is to characterize its interaction with possible host 
cellular receptors. In the case of SARS-CoV, one known 
cellular receptor is ACE2 [33]. In 2005, Stephen Harrison 
and colleagues determined the structure of the SARS-CoV 
S-protein receptor-binding domain (RBD, covering 
residues 318 to 510 of the S-protein) in complex with the 
ACE2 receptor (Fig. 1) [43]. The RBD is the critical 
determinant of virus-receptor interaction and thus of viral 
host range and tropism. 

The specific recognition of ACE2 by the SARS-CoV 
RBD occurs through surface complementarity (Fig. 3B). 
The interface between the RBD and the ACE2 receptor is 
well defined, while the opposite face of the RBD, which 
would interact with the rest of the spike protein, is more 
disordered. As revealed by the authors, the interface 
between the two proteins shows important residue changes 
that facilitate efficient cross-species infection and human- 
to-human transmission. ACE2 is highly conserved in 
mammals and birds, and its receptor activity for SARS- 
CoV can be markedly affected by only a few amino acid 
substitutions at the virus binding site. Subtle changes in the 
RBD residues at positions 479 and 487 in human coro- 
naviruses can increase affinity for human ACE2. Palm 
civet coronaviruses have lysine in position 479 and serine 
in position 487, which reduce affinity for human but not 
palm civet ACE2. The authors further suggest ways to 
make truncated disulfide-stabilised RBD variants for use in 
the design of coronavirus vaccines. 
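
The residue comparison described above can be illustrated with a minimal sketch. Only the civet-associated residues (lysine at 479, serine at 487) are taken from the text; the function name and the two example variants are hypothetical and serve purely as an illustration, not as a predictor of receptor affinity.

```python
from __future__ import annotations

# Civet-associated residues at RBD positions 479 and 487, as stated in the text.
CIVET_SIGNATURE = {479: "K", 487: "S"}


def civet_like_positions(rbd_residues: dict[int, str]) -> list[int]:
    """Return the positions at which a variant carries the civet-associated residue."""
    return [
        pos
        for pos, residue in sorted(CIVET_SIGNATURE.items())
        if rbd_residues.get(pos) == residue
    ]


# Hypothetical variants, given as position -> one-letter residue code.
civet_variant = {479: "K", 487: "S"}
other_variant = {479: "N", 487: "T"}  # assumed residues, for illustration only

print(civet_like_positions(civet_variant))  # [479, 487]
print(civet_like_positions(other_variant))  # []
```

A variant flagged at both positions would, according to the text, be expected to bind human ACE2 with reduced affinity.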


SARS-CoV nucleocapsid protein RNA binding domain 


Specific packaging of the viral genome into the virion is a 
critical step in the life cycle of an infectious virus. The 
nucleocapsid protein (N-protein) plays an important role by 
binding to the genomic RNA via a leader sequence,
recognizing a stretch of RNA that serves as a packaging
signal and leading to the formation of the helical ribonu- 
cleoprotein (RNP) complex during assembly. The structure 
of the RNA binding domain from the SARS-CoV N-protein,
consisting of a five-stranded β-sheet whose fold is
unrelated to those of other RNA binding proteins, has been
determined by NMR (covering residues 45-181) [44] and two
X-ray crystallographic studies (covering residues 45-175)
(Fig. 1) [45]. The authors of the NMR study identified a
binding site for single stranded RNA (ssRNA) using NMR 
to determine the resonance of residues perturbed by the 
addition of RNA. The RNA binding groove in the N-ter- 
minal domain of the N-protein is shallow and should be 
able to bind both single- and double-stranded RNA in 
infected cells. The structure of the N-protein RNA binding
domain exhibits a mode of RNA interaction similar to that of
RNA binding proteins such as U1A RNP. The more recent X-ray
crystal structures of the N-terminal RNA binding domain
of the N-protein are similar overall to the NMR structure
and to two structures from avian infectious bronchitis virus
(IBV) [46], a group III coronavirus. It was suggested that
the SARS-CoV and IBV structures imply a common mode
of RNA recognition, but homology modelling predicts this
is not necessarily the case for related coronavirus
N-proteins. Small molecules that bind to the RNA binding
domain, such as those identified in an NMR-based screen by
Huang and colleagues, might impair the function of the
nucleocapsid [44].


SARS-CoV nucleocapsid protein dimerization domain 


The full-length N-protein is known to form a dimer in 
solution via its C-terminal domain. A crystal structure of 
this so-called dimerization domain, covering residues 270- 
370, was reported in 2006 (Fig. 1) [47]. The structure was 
determined as a dimer and featured extensive interactions 
between the two protomers, consistent with the dimeric 
nature of the full-length protein (Fig. 2F). Sequence 
alignments suggest that the core dimerization domain is 
conserved among the three coronavirus antigenic groups. A 
DALI search for structural similarity did not yield any 
results, but nevertheless the authors found common struc- 
tural features shared by the nucleocapsid protein of an 
arterivirus, porcine reproductive and respiratory syndrome 
virus (PRRSV). The coronaviruses and arteriviruses both 
belong to the Nidovirales and, on a structural basis, it is
suggested that they are evolutionarily linked. From a 
functional aspect, the structure of the N-protein dimeriza- 
tion domain helps to explain the self-association of the N- 
protein to form a large helical nucleocapsid core.
Dimerization is believed to bring the N-terminal RNA binding
domains of N proteins into close proximity, thus enabling
them to interact with the viral RNA and effectively package
the large viral genome into the virion.

It is also worth noting that antigenic peptides of the 
coronavirus N-protein can be recognized on the surface of 
infected cells by T cells [48, 49]. The structure of the 
MHC-I molecule HLA-A*1101 in complex with such a 
peptide derived from the SARS-CoV N-protein, a nonamer
with a SARS-specific sequence, was determined to 1.45 Å
resolution in 2005 [50]. Although it is similar to other
MHC-I molecules and shows a similar peptide binding
mode, the structure adds to the growing library of MHC-I
structures and could be used as a template for peptide-based
vaccine design.


Supramolecular architecture of S, M and N structural 
proteins 


While not strictly part of the structural proteomics remit, it 
is worth including the 2006 work by the Scripps consor- 
tium using cryo-electron microscopy to study the 
supramolecular architecture of the S, M and N structural 
proteins [51]. Their resulting model shows interactions
between S-M, M-M and M-N near the viral membrane, in
accord with previous observations. Proteins located close
to the viral membrane are arranged in overlapping lattices
surrounding a disordered core. The trimeric glycoprotein
spikes appear to be in register with densities for
four underlying ribonucleoproteins. The spikes were dis- 
pensable for ribonucleoprotein lattice formation, and 
ribonucleoprotein particles exhibited coiled shapes fol- 
lowing release from the viral membrane. The overall 
results suggest that lattice formation by structural proteins 
is integral to coronavirus budding. 


Accessory proteins 


In addition to the structural and non-structural proteins, the 
SARS-CoV genome encodes a further eight so-called 
“accessory” proteins unique to this coronavirus. Viruses 
frequently make use of alternative open reading frames to 
achieve greater output from their limited genomes. Out-of- 
frame translation is initiated from a start codon within an 
existing gene and results in a distinct protein product. 
These accessory proteins are poorly characterized struc- 
turally and their functions are largely unknown. They are 
believed to be unimportant in tissue culture but may pro- 
vide the virus with a selective advantage in the infected 
host. The structures of two accessory proteins have been 
determined to date: orf7a and orf9b. 

The crystal structure of the SARS-CoV orf7a luminal
domain was reported in 2005 by Nelson and colleagues
[52]. At the time, significant progress had been made in
understanding the structure-function relationships of
SARS-CoV proteins with essential replication or structural
roles. However, the functions of the accessory proteins,
which are coronavirus group-specific, were poorly understood.
The structure of the first accessory protein from
SARS-CoV therefore provided important new information.
The orf7a luminal domain is an all-β structure comprising
seven β-strands in two β-sheets (Fig. 1). Fold assignment
indicates the orf7a luminal domain is similar to I-set Ig 
proteins and places it as a member of the Ig superfamily, 
despite low sequence identity with other Ig-like proteins. 
The function of Ig-like proteins is diverse, but subcellular 
localization experiments confirm that orf7a is expressed 
and retained intracellularly. Furthermore, the short cyto- 
plasmic tail and transmembrane domain are implicated in 
trafficking orf7a in the endoplasmic reticulum and Golgi 
network. It follows that possible functions of orf7a might 
include roles in viral assembly or SARS-specific budding 
events, or as a secondary attachment protein within the 
virion analogous to the hemagglutinin-esterase (HE) 
protein. 

The SARS-CoV orf9b crystal structure, a new fold, was
solved by the SPINE consortium [53]. It has a dimeric β
structure with an amphipathic surface and a central
hydrophobic tunnel, which has been confirmed to bind lipid
molecules (Fig. 1). SARS-CoV orf9b is most likely involved in
membrane attachment, and further functional studies
confirmed that orf9b associates with intracellular vesicles in
mammalian cells. The authors propose that SARS-CoV 
orf9b may interact with compartments of the ER-Golgi 
network to act as an accessory protein during the assembly 
of the SARS virion. 


Biological implications of SARS-CoV structural 
proteomics 


Since the emergence of SARS in 2003, the structures of a
substantial number of full-length SARS-CoV proteins or
functional domains have been determined by X-ray crystallography or
NMR. Structures are now available for half of the 16 non- 
structural proteins involved in viral replication and tran- 
scription, providing us with a much greater understanding 
of the inner workings of this large and sophisticated 
machinery. The three SARS-CoV structural proteomics
initiatives operate independently, but there is good
communication and co-operation between them, and overlap
is generally avoided even when groups are working on the
same protein targets. For instance, the Chinese and
American initiatives joined forces in 2006 to report the
structure of SARS-CoV nsp10 [27, 28]; the Chinese group
reported an nsp10 dodecamer structure while the American
group reported the monomer structure. In the case of
nsp15, the French group reported the structure of the active
hexameric form from SARS-CoV [29]; the Chinese group
reported the active hexameric form of nsp15 from MHV
[30]; and the American group reported a shortened and
inactive monomeric form of nsp15 from SARS-CoV (P.
Kuhn, personal communication). The different perspectives
offered by the three structural proteomics initiatives can
provide deeper insights into the structure-function
relationships of SARS-CoV proteins.

One interesting and significant outcome of the SARS- 
CoV structural proteomics initiatives is the prevalence of 
new protein folds. Remarkably, of the 16 SARS-CoV
proteins or functional domains with known structure to
date, eight possess new folds, representing a fold
discovery rate of 50% (Fig. 1). This is in contrast to
current estimates which put the discovery of new folds by 
structural genomics efforts targeting other organisms at 
somewhere between 5 and 7%. The overall rate of fold 
discovery is currently estimated at around 10%. The high
rate for SARS-CoV is perhaps not surprising, as viruses are
the most biodiverse of all biological entities. One of the
principal aims of structural genomics is completion of the
protein fold space, and
in this regard the SARS-CoV structural proteomics initia- 
tives have been successful. The addition of new folds to the 
Protein Data Bank should improve understanding of the 
structure—function relationships of several new families of 
proteins. 
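
The fold discovery figures quoted above can be checked with a few lines of arithmetic. The sketch below uses only the numbers given in the text (eight new folds among 16 structures, and baseline estimates of 5%, 7% and 10%); the function name is a hypothetical helper introduced for illustration.

```python
def fold_discovery_rate(new_folds: int, structures: int) -> float:
    """Return the percentage of solved structures that revealed a new fold."""
    if structures <= 0:
        raise ValueError("structures must be positive")
    return 100.0 * new_folds / structures


# Figures quoted in the text: eight new folds among 16 solved structures.
sars_rate = fold_discovery_rate(8, 16)
print(f"SARS-CoV fold discovery rate: {sars_rate:.0f}%")

# Baseline estimates quoted in the text, in percent.
baselines = {
    "structural genomics (lower estimate)": 5.0,
    "structural genomics (upper estimate)": 7.0,
    "overall": 10.0,
}
for label, baseline in baselines.items():
    print(f"{label}: {sars_rate / baseline:.1f}-fold higher")
```

On these figures, the SARS-CoV initiatives found new folds between five and ten times more often than the quoted baselines.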

At the time of the 2003 outbreak, there were no thera- 
peutic agents against SARS-CoV or indeed against any 
other coronavirus. Coronavirus research up to that point 
had been limited, largely due to the lack of medical or 
economic incentives as human coronaviruses were con- 
sidered relatively harmless. Until the emergence of SARS, 
coronaviruses had been known to cause predominantly 
severe diseases in animals and only comparatively mild 
diseases in humans. Coronaviruses account for a significant 
percentage of upper and lower respiratory tract infections 
in humans, including common colds, bronchiolitis and 
pneumonia, and are also implicated in otitis media, exac- 
erbations of asthma, diarrhoea, myocarditis and 
neurological disease [54-59]. Anti-coronavirus
drug discovery strategies to date have generally been 
focused in two main areas: blocking viral entry into the 
host cell, or inhibiting viral replication and transcription. In 
the case of the former, the availability of SARS-CoV spike 
protein fusion core structures will enable the design of 
inhibitors that block viral entry by targeting the pre-fusion 
hairpin intermediate [60]. In the latter case, three major 
conserved targets have been identified among the SARS 
non-structural proteins: nsp5, the main protease; nsp12,
the RNA-dependent RNA polymerase; and the RNA helicase
[21].

While SARS was brought under control by effective
global public health measures and is no longer in circula- 
tion among humans, there is still a possibility that it could 
re-emerge. The recent discovery of animal reservoirs for a 
SARS-like coronavirus has prompted new public health 
fears [61, 62]. Furthermore, the human coronaviruses 
HCoV-NL63 and HCoV-HKU1 were identified in the wake
of SARS [58, 59]. Several key factors controlling the host 
spectrum and viral pathogenicity are highly variable among 
CoVs, including the requirement of different host receptors 
for cellular entry, poorly conserved structural proteins 
(antigens), and diverse accessory genes in their 3’-terminal 
genome regions that most likely contribute to the
pathogenicity of CoVs in specific hosts [6-8, 33, 59, 63-65].
This structural and functional diversity presents a
significant obstacle for the design of a versatile compound 
against all CoVs. For instance, a fusion peptide inhibitor 
derived from the MHV spike protein cannot prevent SARS- 
CoV replication in cell culture [40]. Identification of con- 
served structural targets among the coronaviruses will 
provide an opportunity for the development of broad- 
spectrum inhibitors against all CoV-related diseases. 


Concluding remarks 


The emergence of SARS in 2003 had a particularly
devastating impact, both on human health and on the global
economy, and demonstrated how rapidly viruses can spread
around the world. The outbreak also provided a stark 
warning of how ill-prepared we were at the time against a 
newly emerging infectious disease such as SARS. The 
paucity of available scientific data for coronaviruses was a 
considerable disadvantage, but scientists mounted a rapid 
international response to the threat of SARS. For instance,
the SARS coronavirus was quickly identified and its genome
was sequenced within weeks [1, 2, 3, 4]. Ultimately,
however, the disease was only brought under control by 
effective public health control measures. Since then, con- 
siderable efforts have been made by researchers around the 
world to understand the origins of the virus, its inner 
workings and its interaction with host cells. 

The accumulated structural and functional data from the 
SARS-CoV structural proteomics initiatives will have 
many obvious benefits. First, the available structural 
information will provide a starting point for understanding 
important viral mechanisms. Specifically, the structures of 
the non-structural proteins will help elucidate their func- 
tions, many of which were previously unknown, and 
provide a vital starting point for understanding the unique 
and complex mechanism of coronavirus replication and 
transcription. Second, the new fold information provided 
by SARS-CoV structures will aid the understanding of the
structure-function relationships of several new protein
families. Third, the availability of SARS-CoV structures 
provides targets for the structure-based discovery of anti- 
viral compounds for therapeutic intervention. In the event 
of another emerging coronavirus, a stockpile of anti-coro- 
naviral agents could provide an effective first line of 
defence. 

Regarding the future prospects of SARS-CoV structural 
proteomics, significant challenges still lie ahead. All of 
the structural proteomics initiatives have experienced 
difficulties in expressing stable and functional SARS-CoV 
proteins. Furthermore, the SARS-CoV proteins that 
remain to be structurally characterized include several 
membrane proteins. While some progress has been made 
towards understanding the functions of the various SARS- 
CoV proteins, there is still a long way to go towards 
discovering how the proteins interact with each other, 
with the viral RNA and with host proteins. The structure
of the S-protein RBD in complex with the cellular receptor
ACE2 is a significant step towards understanding the
mechanisms of host recognition [43]. For the replicase
proteins, we are slowly learning how they interact with
one another within the replication machinery. Our group
has already determined the structure of the complex between
nsp7 and nsp8 [22]. In addition to their nsp9 structure,
Sutton and colleagues showed evidence for its interaction 
with nsp8 [26]. Furthermore, dual-labeling studies of 
SARS-CoV replicase proteins have demonstrated co- 
localization of nsp8 with nsp2 and nsp3 [5]. The available 
three-dimensional structures of nsp7, nsp8 and nsp9
provide a starting point to reveal the architecture and
underlying functions of the replication/transcription
complex.